ABSTRACT



In April 2000, the Digital Library Federation commissioned 
three reports to address broader concerns about digital collections in 
research libraries. This report synthesizes the nearly 10 years' experience 
that libraries have had digitizing items from their rare, special, and 
general collections, and making them available online. The report 
demonstrates that digitization programs work best where their role within a 
library's collection development strategy is clearly understood, and 
identifies several roles that such programs can play. The author muses about 
the extent to which digitally reformatted special and rare collections can 
actually support scholarly research, and looks at whether leading research 
libraries in particular might more usefully focus on digitizing general as 
opposed to special and rare collections. The report opens with points to 
consider in developing a sustainable strategy. The second section addresses 
identification, evaluation, and selection, discussing policies, guidelines, and
best practices, and rationales for digitization. The third section focuses on 
institutional impacts and discusses treatment and disposition of source 
materials, scalability, intellectual control and data management, coordinated 
collection development, funding, preservation, and support of users. A final 
section addresses challenges in evaluating costs and benefits, and offers 
recommendations. (Contains 47 references.) (AEF) 
Preface

In January 2000, the Digital Library Federation (DLF) launched an informal 
survey to identify the major challenges confronting research libraries that use 
information technologies to fulfill their curatorial, scholarly, and cultural mis- 
sions. With astonishing unanimity of opinion and clarity of voice, respon- 
dents pointed to digital collection development as their single greatest chal- 
lenge. Whether the digital information came from a commercial publisher or 
from a digitization unit within the library, it seemed to exist under a cloud of 
profound and unsettling uncertainty. Would it be useful and useable in its 
present or intended form, or require additional work by catalogers, systems 
staff, or subject bibliographers? What new demands would its availability 
make on library reference staff? What level of continued investment would be 
necessary to ensure its accessibility on current hardware and software? 

The survey also revealed that leading research libraries had learned a 
great deal about their digital collections through experience. Though substan- 
tial, that learning had rarely been expressed outside the collection policies, 
working papers, and implementation guidelines that libraries create to coor- 
dinate and manage their collection development efforts. Accordingly, in April 
2000, the DLF commissioned three reports to address broader concerns about 
digital collections. They are: Building Sustainable Collections of Free Third-Party
Web Resources, by Louis Pitschmann; Selection and Presentation of Commercially
Available Electronic Resources: Issues and Practices, by Timothy Jewell; and the
report before you. The reports mark a starting point for what we hope will 
emerge as an evolving publication series. 

Working to a common outline and based on the lessons of experience, the 
authors demonstrate how decisions taken by a library when acquiring (or cre- 
ating) electronic information influence how, at what cost, and by whom the 
information will be used, maintained, and supported. By assembling and re- 
viewing current practice, the reports aim where possible to document effec- 
tive practices. In most cases, they are able at least to articulate the strategic 
questions that libraries will want to address when planning their digital col- 
lections. 

In this report, Abby Smith synthesizes the nearly 10 years' experience 
that libraries have had digitizing items from their rare, special, and general 
collections, and making them available online. The learning she uncovers is 
distilled in and extended by several case studies conducted in leading digital 
libraries with very different digitization programs. Smith demonstrates that 
digitization programs work best where their role within a library's collection 
development strategy is clearly understood, and she identifies several roles 
that such programs can play. Smith also asks a number of searching
questions. She muses about the extent to which digitally reformatted special and
rare collections can actually support scholarly research. Probing further, she 
wonders whether leading research libraries in particular might more usefully 
focus on digitizing general as opposed to special and rare collections. In this 
way, they would make important holdings available in new ways while
taking a first step in avoiding costs associated with their redundant
management. The report is consequently much more than a strategic guide for
individual institutions; it is a route map that points to important directions for the
library community as a whole.



Daniel Greenstein 

Director, Digital Library Federation
1. Introduction



Libraries have been digitizing collections for a decade or more.
Their collective experience has produced a depth of technical 
expertise and a set of tested practices. That information is 
widely shared among digital library staffs and has been well report- 
ed at meetings and in publications. This ongoing experiment with 
representing research collections online has resulted in the codifica- 
tion of technical practices and the emergence of clear trends in selec- 
tion policies. This paper reviews existing selection practices in librar- 
ies, identifies selection policies and best practices where they exist, 
and discusses the long-term implications of the opportunities and 
constraints that shape digital-conversion programs. This is not a sys- 
tematic review of what all research libraries are doing, but an analy- 
sis of significant achievements that will make it possible to identify 
good practices and benchmarks for success. Every library, regardless 
of size or mission, will need to determine for itself how and when 
digitization will move from being an experiment to becoming a col- 
lection-development strategy that is well integrated into its daily 
practice. 

For purposes of analysis, this study looks primarily at a subset of 
"first-generation" digital libraries, that is, those that have been en- 
gaged in significant digitization projects for a while. However, the 
study also looks at a few libraries that are just beginning to develop 
digitization programs to see what approaches they have taken, in 
light of others' experience. Research was conducted by studying the 
Web sites of all Digital Library Federation (DLF) members as well as 
the sites of other libraries and research institutions engaged in put- 
ting collections online. More important for analytical purposes were 
the site visits made to selected libraries — the University of Michigan, 
Cornell University, the University of Virginia (UVA), The New York 
Public Library (NYPL), New York University (NYU), and the New- 
York Historical Society. In addition to the fact that some of the select- 
ed institutions are first-generation digital libraries and others are not, 
there are great differences in governance and funding among the li- 
braries surveyed. Some are in public universities, some are in private 
institutions, and some are independent of an academic institution. 
These differences are reflected in the various approaches they take to
selecting what to digitize, how to do so, and for whom. 

Each library was given a set of questions about selection criteria 
that constituted the framework for investigation, and each institution 
organized its responses individually. (The Library of Congress [LC], 
also included in this study, answered the questions in writing, and 
no site visit was made.) The questions begin with the selection pro- 
cess and proceed through the creation of metadata, decisions about 
access policies, and user support systems. 

1.1. Defining a Sustainable Strategy

While the great majority of research libraries have undertaken digiti- 
zation projects of one type or another, only a few are developing full- 
scale digitization programs rather than focusing on discrete uses of 
digitization for specific purposes. How do the libraries that have un- 
dertaken full-scale efforts conceptualize the role of digitized collec- 
tions in providing collections and services to their core constituen- 
cies? What are they doing, or what have they determined must be 
done, to move from project-based conversion to programs that, 
whether large or small, have a well-defined role in the long-term 
goals of the library? 

This report works from the assumption that to be sustainable, a 
digitization program should have certain intrinsic features. It should 

• be integrated into the fabric of library services; 

• be focused primarily on achieving mission-related objectives; 

• be funded from predictable streams of allocation, be they external 
or internal; and 

• include a plan for the long-term maintenance of its assets. 

A sustainable digitization program, in other words, would be 
fully integrated into a library's traditional collection-development 
strategies. A digitization program need not be large and production 
oriented to be sustainable. The role of conversion can be significant 
and well thought out, even when the conversion program serves lim- 
ited purposes and has limited resources. 

Any assessment of what libraries have achieved so far must take 
into account two key factors common to sustainable collection devel- 
opment, be it of analog, digitized, or born-digital materials. These 
factors are 

• a strategic view of the role of collections in the service of research 
and teaching or other core institutional missions, and 

• life cycle planning for the collections, beginning with their identifi- 
cation and including acquisition, cataloging and preservation, and 
providing reference. 

A strategic view can be revealed in many cases not only by look- 
ing at how closely the results serve the mission but also at the deci- 
sion-making process itself — that is, who decides what to convert to 
serve which ends. When are the decisions made primarily by subject 
specialists based on existing collection strengths, and when is the
selection process shaped by curricular development and other faculty
needs? If the latter, then by what process are the faculty involved 
and how are teaching and research tools developed to meet their 
needs? 

Ensuring long-term access to digital collections depends on
careful life-cycle management. How does the library budget not only for
the creation of the digital scans but also for the metadata, storage
capacity, preservation tools (e.g., refreshing, migration), and user
support, the sorts of things that are routinely budgeted for book
acquisitions? How much of the program is supported by grant funding
and how much by base funding? If the program is currently grant 
supported, what plans exist to make it self-sustaining? A sustainable 
digitization strategy may well include the creation of digital surro- 
gates that serve short-term needs and do not demand long-term sup- 
port. The crucial thing is to anticipate what support, if any, will be 
needed. 

Selecting materials for digitization is more complex than is se- 
lecting materials for the purchase or licensing of born-digital materi- 
als, because it involves expending resources for items that are al- 
ready in the library's collection rather than acquiring new ones. In 
theory, a library would choose to digitize existing collection items 
only if it could identify the value that is added by digitization and 
determine that the benefits outweigh the costs. But in practice, the 
research library community has, over the past decade, gone boldly 
forth with digitization projects not knowing how to measure their 
costs or benefits. Digitization technology and its costs are constantly 
changing; as a result, budgeting models that make comparisons be- 
tween libraries can be meaningless or downright misleading. Unlike 
selecting officials who decide the purchase or license of electronic 
resources, those responsible for digital conversion do not have a set 
of fixed prices for services and collections on offer. The only way for 
many libraries to get at the issue of cost is to undertake projects for 
their own sake, in the expectation that documentation of expendi- 
tures will yield some meaningful data. Libraries that have been able 
to secure funding for projects, document their activities and expendi- 
tures, and share that information with their colleagues have emerged 
as the leaders of the community, if only because of their policies to 
share their knowledge. Their experiences are more relevant for this 
report than are those of others who have embarked on fewer projects 
or who have failed to document and share their knowledge. 

The other unknown factor in this first decade has been the bene- 
fit — the potential of this technology to enhance teaching, research, 
lifelong learning, or any number of possible goals that digitization is 
intended to achieve. How could we know in advance how users oth- 
er than ourselves would adapt this technology? How could we con- 
ceptualize use of digitally reborn collections except by extrapolating 
what we know from the analog realm? Regrettably, most academic 
institutions, despite their clearly stated goals of improving or at least 
enhancing research and teaching, have done less than they might 
have to gather meaningful data about the uses of digitized
collections. While this report does address issues of costs and benefits, it
should be remembered that as a community, we still have insuffi- 
cient data on which to draw firm conclusions and base recommend- 
ed practices. 

This report aims to synthesize experiences in order to identify 
trends, accomplishments, and problems common to libraries and to 
many cultural institutions when they represent their collections on- 
line. A brief review of rationales for digitization is followed by a dis- 
cussion of the ways in which digitization affects an institution. What 
consequences, intended or not, result from selection decisions? What 
factors, such as funding, may set constraints on decision making? 
One of the chief factors influencing selection decisions is copyright. 
This topic will not be explored here in detail because of its complexi- 
ty, but considerations related to rights management are often fore- 
most in mind when librarians assess collections for digitization. At 
the program planning stages, copyright is often viewed from the 
point of view of risk management.1 Eliminating materials that are or
might be under copyright reduces the risk of infringement to zero. 
But how does that affect the concepts of completeness and fair use 
that traditionally guide library access policies? 



2. Identification, Evaluation, and Selection 



2.1. Policies, Guidelines, and Best Practices 

A great deal has been written on the subject of selection for digitiza- 
tion and on the management of conversion projects. Much of this lit- 
erature is published on the Web and has become de facto "best prac- 
tice," to the extent that many institutions applying for digitization 
grants use it to plan their projects and develop selection criteria. In 
addition to these guidelines, there are a number of reports about se- 
lection for digitization that range from project management hand- 
books and technical guides to imaging, to broad, nontechnical arti- 
cles aimed at those outside the library community who fund such 
programs.2 

Very few libraries have developed their own formal written poli- 
cies for conversion criteria. Those that do have such documents tend 
to refer to them as "guidelines." These documents tend to focus on 
technical aspects of selection and, even more, on project planning. 
When asked why they do not have a policy, most institutions reply 
that it is too early to formulate policies, that they have not gotten 
around to formulating them, or that the institution does not have 




1 Literature on copyright abounds; among the most useful in program planning 
is by Melissa Levine in Sitts 2000. 

2 See Research Libraries Group 1996; Digital Library Federation and Research 
Libraries Group 2000; Sitts 2000; Smith 1999; Gertz 2000; de Stefano 2001; Kenney 
and Rieger 2000; Library of Congress National Digital Library Program 1997.

Strategies for Building Digitized Collections 



written collection-development policies for other materials so it is 
unlikely to write them for digitized collection development. 

These documents almost always focus on the planning of digital 
projects or of various elements of a larger program, rather than on 
the rationale for digitization. The University of Michigan, for exam- 
ple, has a written policy that clearly aims to fit digitization into the 
context of traditional collection development. It states that "core 
questions" underlying digitization should be familiar to any research 
library collection specialist (University of Michigan 1999). These 
questions are as follows: 

• Is the content original and of substantial intellectual quality? 

• Is it useful in the short and/or long term for research and
instruction?

• Does it match campus programmatic priorities and library collect- 
ing interests? 

• Is the cost in line with the anticipated value? 

• Does the format match the research styles of anticipated users? 

• Does it advance the development of a meaningful organic collec- 
tion? 

These are fundamental collection-development criteria that as- 
sert the importance of the research value of source materials over 
technical considerations; however, they are quite general. The rest of 
Michigan's policy focuses not on how to select items for conversion 
but on how anticipated use of the digital surrogates should affect de- 
cisions about technical aspects of the conversion, markup, and pre- 
sentation online. 

Harvard's selection criteria offer far more detailed consider- 
ations than do those of Michigan (Hazen, Horrell, and Merrill-Old- 
ham 1998). In common with the Michigan criteria, the Harvard crite- 
ria focus largely on questions that come after the larger, 
"why-bother-to-digitize-this-rather-than-that" issues have already 
been answered. Creation of digital surrogates for preservation pur- 
poses is cited as one legitimate reason for selection, as are a number 
of considerations aimed not at preservation but solely at increasing 
access. (Sometimes digitization does both at once, as in the case of 
rare books or manuscripts.) 

The Harvard guidelines have been useful to many beyond Har- 
vard who are engaged in planning conversion projects because they 
present a matrix of decisions that face selectors and are available on 
the Web (Brancolini 2000). The authors begin with the issue of copy- 
right — whether or not the library has the right to reformat items and 
distribute them in limited or unlimited forms. They then ask a series 
of questions derived from essentially two points of departure: 

Source material. Does it have sufficient intellectual value to war- 
rant the costs? Can it withstand the scanning process? Would digiti- 
zation be likely to increase its use? Would the potential to link to oth- 
er digitized sources create a deeper intellectual resource? Would the 
materials be easier to use? 
Audience. Who is the potential audience? How are they likely to
use the surrogates? What metadata should be created to enhance 
use? 

The answers to these and similar questions should guide nearly 
all the technical questions related to scanning technique, navigation- 
al tools and networking potential, preservation strategy, and user 
support. 
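
The order of questions described above can be sketched as a simple checklist. The following Python sketch is an illustration only, not the Harvard criteria themselves: the field names, the order of the later checks, and the rule that at least one access benefit must be present are all assumptions made for the example. It shows the one point the guidelines do make explicit, that copyright is asked first, followed by questions about the source material and the audience.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Candidate:
    """A selector's answers for one candidate collection (field names are hypothetical)."""
    title: str
    rights_cleared: bool        # may the library reformat and distribute the items?
    value_warrants_cost: bool   # sufficient intellectual value to warrant the costs?
    withstands_scanning: bool   # can the originals withstand the scanning process?
    increases_use: bool         # would digitization be likely to increase use?
    gains_from_linking: bool    # would links to other digitized sources deepen it?
    easier_to_use: bool         # would the materials be easier to use?
    audience_known: bool        # has the potential audience been identified?


def assess(c: Candidate) -> str:
    """Return 'proceed' or a 'defer: ...' reason, asking the copyright question first."""
    if not c.rights_cleared:
        return "defer: rights unresolved"
    if not (c.value_warrants_cost and c.withstands_scanning):
        return "defer: source material"
    if not c.audience_known:
        return "defer: audience unknown"
    # Assumption for this sketch: at least one access benefit is needed to proceed.
    if not (c.increases_use or c.gains_from_linking or c.easier_to_use):
        return "defer: no clear added value"
    return "proceed"


# Hypothetical example; the answers are invented for illustration.
example = Candidate(
    title="Hypothetical pamphlet collection",
    rights_cleared=True, value_warrants_cost=True, withstands_scanning=True,
    increases_use=True, gains_from_linking=False, easier_to_use=True,
    audience_known=True,
)
print(assess(example))
print(assess(replace(example, rights_cleared=False)))
```

A checklist of this kind records only the answers; the judgment behind each answer, especially on research value, remains with the selector.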

The primary nontechnical criterion — research value — is a subjec- 
tive one and relies on many contingencies for interpretation. What 
does it mean to say that something has intrinsic research value? Do 
research libraries collect any items that do not have such value? 
Should we give priority to items that have research value today or to 
those that may have it tomorrow? What relationship does current 
demand have to intrinsic value? Because the answers to these ques- 
tions are subjective, the only things excluded under these selection 
criteria are items that are difficult to scan (for example, oversized 
maps) or things that are very boring or out of intellectual fashion. 
Interestingly, foreign language materials are nearly always excluded 
from consideration, even if they are of high research value, because 
of the limitations of optical character recognition (OCR) software and 
because they often have a limited number of users. There are digital 
projects that have converted valuable historical sources, from Egyp- 
tian papyri to medieval manuscripts, into image files (such as the 
Advanced Papyrological Information System [APIS] and the Digital 
Scriptorium). In general, however, the conversion of non-English lan- 
guage sources into searchable text continues to be rare. 

This high-level criterion of research value is also an intrinsic part 
of traditional collection-development policies. The difference is that 
in most libraries, the acquisition of monographs, to take an example, 
fits into a longstanding activity that has been well defined by prior 
practice. This practice governs the acquisition of new materials, that 
is, those that the library does not already hold. (The issue of how 
many copies is secondary to the decision to acquire the title.) Selec- 
tion of an item for digitization is reselection, and the criteria for its 
digitization, or repurposing, will be different from those for its acqui- 
sition. The meaning of research value will also differ, because the 
methods of research used for digital materials differ from those used 
for analog, and the types of materials that are mined — and how — are 
also fundamentally different. Several large digitization programs to- 
day are grounded in the belief that it is the nature of research itself 
that is "repurposed" by this technology, and it is often surprising to 
see which source material yields the greatest return when digitized. 

As one librarian said, the guidelines addressing selection that are 
used routinely, whether official or not, are by and large "project ori- 
ented." It would be a mistake to confuse what libraries are doing 
now with what libraries should and would do if "we understood 
what higher purpose digitization serves." While guidelines for tech- 
nical matters such as image capture and legal rights management are 
extremely useful and should be codified, formal collection-develop- 
ment policies are still a long way off. 
2.2. Rationales for Digitization

Libraries usually identify two reasons for digitization: to preserve 
analog collections, and to extend the reach of those collections. Most 
individual projects and full-scale programs serve a mix of both pur- 
poses. As librarians have learned from tackling the brittle book prob- 
lem through deacidification and reformatting, it is difficult and often 
pointless to separate preservation and access. When a library is
seeking outside funding for digital conversion (apparently still the
primary source of funding for many libraries), it tends to cite as many
benefits from conversion as possible. For this reason, preservation
and access are usually mentioned in the same breath. Nonetheless,
because it has been generally conceded that digital conversion
is not as reliable for preservation purposes as is microfilm
reformatting, it is worthwhile to consider what institutions are doing 
in terms of preservation per se. 

2.2.1. Preservation

2.2.1.1. Surrogates

The use of scans made of rare, fragile, and unique materials — from 
prints and photographs to recorded sound and moving image — is 
universally acclaimed as an effective tool of preventive preservation. 
For materials that cannot withstand frequent handling or, because of 
their value or content, pose security risks, digitization has proved to 
be a boon. 

2.2.1.2. Replacements

For paper-based items, librarians generally agree that digital scans 
are the preferred type of preservation surrogates. They are widely 
embraced by scholars and are preferred over microfilm. However, 
most librarians also assert that scanning in lieu of filming does not 
serve preservation purposes, because the expectation that we can mi- 
grate those scans into the future is simply not as great as is our con- 
viction that we can manage preservation microfilm over decades. 
There is a general hope that the problem of digital longevity will 
soon be resolved. In anticipation of that day, most libraries are
creating "preservation-quality" digital masters, scans that are rich
enough to use for several different purposes and are created to
obviate the need to rescan the original. These masters are sometimes
created together with preservation master microfilm.

Only one institution, the University of Michigan, has a policy to 
scan brittle books and use the scans as replacements rather than as 
surrogates. The university has created a policy for the selection and 
treatment of these books, and it explicitly talks of digital replace- 
ments as a crucial strategy for collection management (University of 
Michigan 1999). This policy is based on the premise that books print- 
ed on acid paper have a limited life span and that, for those items 
with insignificant artifactual value, the library is not only rescuing 
the imperiled information but also making it more accessible by 
scanning in lieu of filming. (The preservation staff continues to
microfilm items identified by selectors for filming as well as to
deacidify volumes that are at risk but not yet embrittled.) The focus of
Michigan's digital program is the printed record, not special collections
such as rare books and photographs, and digitization has been made 
a key collection-management tool for these holdings. Cornell also 
has incorporated digitization into collection management (that is, it 
is not for access alone), although its efforts are not as systematic as 
are those of Michigan. At Cornell there is a preference for digital re- 
placements of brittle materials with backup to computer output mi- 
crofilm (COM) or replacement hard copies made from digital scans. 
The Library of Congress has also begun implementing preservation 
strategies based on digitization in its project to digitize the nine- 
teenth-century journal, Garden and Forest. Most libraries, though, 
elide the issue of digitally replacing brittle materials because they 
scan chiefly items from special collections. 

For audiovisual materials, digital replacements appear to be in- 
evitable, although standards for archival-quality re-recording have 
yet to be established. Because the recording media used for sound 
and moving image demand regular, frequent, and ultimately de- 
structive reformatting, migrating onto digital media for preservation, 
as well as access, is acknowledged to be the only course to pursue for 
long-term maintenance. The University of California at Los Angeles 
(UCLA) and LC, both deeply engaged in audiovisual preservation, 
intend to digitize analog materials to provide long-term access. This 
does not mean that these institutions will dispose of the original ana- 
log source materials, only that the preservation strategy for these 
items will not be based on routine use of that analog source material. 

2.2.2. Access 

In nearly all research libraries, digitization is viewed as service of 
collections in another guise — one that provides enhanced functional- 
ity, convenience, some measure of preservation, aggregation of col- 
lections that are physically dispersed, and greatly expanded reach. 
Among all the strands of digitization activities at major research in- 
stitutions, there are essentially three models of collection develop- 
ment based on access: one that serves as outreach to various commu- 
nities; one that is designed to build collections; and one that is driven 
by a specific need, such as demand by a user or preservation surro- 
gates, or that is part of a larger effort to develop core infrastructure. 
All libraries engage in the first kind of access to one degree or anoth- 
er. Significant strategic differences are evident, however, in their ap- 
proaches to the choice between mounting large bodies of materials in 
the expectation of use versus collaborating with identified users to 
facilitate their data creation. 

2.2.2.1. Access for Outreach and Community Goals

There will continue to be times when academic libraries create digi- 
tal surrogates of their analog holdings for reasons that are important 
to the home institution yet not directly related to teaching and re- 
search. Libraries will continue to be parts of larger communities that 
look to them for purposes that transcend the educational mission of 



the library per se. As custodians of invaluable institutional intellectu- 
al and cultural assets, libraries will always play crucial roles in fund 
raising, cultivating alumni allegiance, and public relations. 

Occasions for selective digitization projects include exhibitions, 
anniversaries (when archives or annual reports often get into the 
queue), a funding appeal (digitization as a condition of donation), 
and efforts to build institutional identity. Careful consideration needs 
to be given to what goes online for whatever purpose because, once 
a collection is online, it becomes part of the institutional identity. Im- 
age building is a critical and often undervalued part of ensuring the 
survival of the library and its host institution. As custodians of the 
intellectual and cultural treasures of a university, libraries have an 
obligation to share that public good to the advantage of the institu- 
tion. 

2.2.2.2. Building Collections

The most common approach to digitization hinges on a collection- 
driven selection process in which a library decides to scan a set of 
materials identified by staff as having great research potential online. 
The terms "collection driven" and "use driven" are familiar from the 
preservation microfilming projects of the 1980s and 1990s. At that 
time, brittle books were conceptualized as a "national collection" 
that was held by a group of libraries, not just one. Each library could 
help preserve the national collection by filming a set of its holdings 
that were particularly strong, thereby avoiding duplication of effort 
by registering its filming activities and enhancing access to the en- 
dangered materials through the loan or duplication of microfilm. Li- 
brarians would select books not on the basis of their documented use 
(by checking circulation statistics or going with items that cross the 
circulation desk), but on the basis of whether they constituted a co- 
herent set of monographs (and occasionally serials) arranged by sub- 
ject matter or date of publication, or both. 

This method, so well-known to preservation experts and subject 
specialists from the many grant-supported microfilming projects of 
the past two decades, has been transferred to the digital realm with 
one interesting twist: It has been extended primarily not to the gener- 
al collections — monographs and serials — that are the heart of micro- 
filming projects, but to special collections — materials that are rare or 
archival in nature or that are in nonprint format. This means that a 
library will scan items that exist as a defined collection, either by for- 
mat (incunabula, daguerreotypes) or by genre of literature (antisla- 
very pamphlets, travel literature gathered as a discrete group and 
held as such in the rare book department, photographs of the Recon- 
struction-era South given to the library by a donor and known under 
the donor's name, Sanborn fire insurance maps, and so forth). 

Within each group, libraries may attempt to be as comprehensive 
as possible in putting items from the collection online to simulate the 
comprehensive or coherent nature of the source collection. Examples 
of such collection-driven digital collections are the Making of Ameri- 
ca (a subject and time period), Saganet (a set of special collection 
items held by certain repositories that relate to Icelandic sagas), the
Sam Nunn Papers project at Emory University, the Hoagy Carmichael
site at Indiana University, and the Scenery Collection at the
University of Minnesota.

In a survey of selection strategies at 25 research libraries, Paula 
de Stefano found that "the most popular approach to selecting col- 
lections for digital conversion is a subject-and-date parameter ap- 
proach applied, by and large, to special collections, with little regard 
for use, faculty recommendations, scholarly input, editorial boards, 
or curriculum" (de Stefano 2001, 67). A recent analysis of 99 research 
libraries and their special collections done by the Association of Re- 
search Libraries (ARL) revealed that virtually all have been digitiz- 
ing some of their special collections. The list of digitized collections 
submitted to ARL during this survey reveals just how eclectic these 
scanning projects are, a result fully consonant with de Stefano's
findings about selection policies (Panitch 2001, 99, 116-123). This
approach has often slyly been referred to as the "field of dreams"
method of collection development ("build it and they will come"),
implying a certain naive hopefulness on the part of the selectors but 
also hinting at the elements of surprise and serendipity we see in the 
digital realm. 

2.2.2.3. Meeting Use and Infrastructure Needs

Some libraries have decided that they will digitize collections only in 
response to explicit user-driven needs. As a state-supported institu- 
tion, UVA has developed access projects that serve state and regional 
needs. These projects are based primarily on UVA's special collec- 
tions holdings and are similar in scope and purpose to the type de- 
scribed above. The University of Virginia also has several digital con- 
version initiatives that are explicitly user driven, and these programs 
exist both in the library and elsewhere on campus. At the Institute 
for Advanced Technology in the Humanities (IATH), an academic 
center located in the library but administratively separate, scholars 
develop deep and deeply interpreted and edited digital objects that 
are, by any other name, publications. Examples include projects on 
the writers William Blake, Dante Gabriel Rossetti, and Mark Twain, 
as well as the Valley of the Shadow Civil War site. Within the library 
is the Electronic Text Center, where staff members choose to encode 
humanities texts that they put up without the interpretive apparatus 
of the IATH objects. These are analogous to traditional library mate- 
rials that are made available for others to interpret; the difference is 
that encoded text is far more complicated a creature than is the OCR 
text that other libraries are creating. The Electronic Text Center is not 
so much responding to faculty or student demand as it is being driv- 
en by a technology. Exploring the potential of various encoding 
schemes is part of its agenda. 

Under its Digital Library Initiative and with funds provided by 
the university, Harvard University libraries are concentrating on 
building an infrastructure to support born-digital materials first and 
foremost, rather than on building collections of digital surrogates of 
existing collections. Where the libraries have converted items, the
criteria for selection have to do with user needs, not general
collection building. And while the holdings of the more than 100
repositories in the university certainly comprise a rich collection of
cultural heritage, Harvard will attempt to serve the Harvard community, not
the larger community (Flecker 2000). "While in many instances the 
digital conversion of retrospective materials already in the Universi- 
ty's collections can increase accessibility and add functionality and 
value to existing scholarly resources, it is strategically much more 
important that the library begin to deal with the increasing flood of 
materials created and delivered solely in digital format." Although 
$5 million of the $12 million allocated by the university is for content 
development, so far the majority of content development comprises 
conversion-for-access purposes. Slated for review are the collections 
that have been mounted so far. "One specific issue being discussed is 
the randomness of the areas covered by the content projects. Since 
these depend upon the initiative of individuals, it is no surprise that 
the inventory of projects undertaken is spotty, and that there are no- 
table gaps .... It is also possible that specific projects will be com- 
missioned to address strategic topics" (Flecker 2000). However, the 
gaps Flecker refers to are not content per se — specific subjects that 
would complement one another — but content that demands different 
types of digital format — for example, encoded text, video, or sound 
recordings. This is a technical criterion, of course, independent of 
collection development, and is fully concordant with the purposes 
that Flecker says the initiative is to serve.

At New York University (NYU), the focus is on the user as part 
of a plan that allocates relatively modest resources for digitization. 
New York University presents collections of cultural significance 
through online exhibitions and other modes of Web outreach rather 
than engaging in full collection conversion. This library has decided 
to concentrate on working with faculty and graduate students to de- 
velop digital objects designed to enhance teaching and research 
through its Studio for Digital Projects and Research and its Faculty 
Technology Center. Like Harvard, NYU is giving priority to the de- 
velopment of an infrastructure to deal with born-digital materials 
and, in an institution with extensive programs and collections in the 
arts and performance studies, on multimedia archives converted to 
digital form for presentation. New York University plans to give 
grants to faculty members to develop teaching and research tools. 
However, the library staff is now putting much of its effort toward 
preparing for the time, seen to be imminent, when the demands of 
born-digital materials will obviate any initiative to create large col- 
lections of digital surrogates. 

The Cornell libraries have tried both collection- and user-driven 
approaches to selection. In several instances, staff members have be- 
gun with expressed interests of faculty, say for teaching, and have 
developed digital collections based on those interests. In each case, 
however, library staffs have expanded their brief and have augmented
faculty choices with related materials. A faculty member's interests
are usually fairly circumscribed, and librarians select a good
deal of additional materials on a topic, such as Renaissance art, to 
add depth to a selection. As a result, a selection of materials becomes 
a collection and has a wider scope of content. Research librarians are 
used to thinking of collections as being useful to the extent that they 
offer comprehensiveness or depth. Scholars, on the other hand, take 
comprehensiveness for granted and concentrate on making choices 
and discriminations among collection items in order to build a case 
for an interpretation. These two views of collections are complemen- 
tary; however, when it comes to selection for digitization, they create 
the most difficult choices facing libraries in digitization programs. 
Selection is an "either/or" proposition. It seldom tolerates "both/and"
solutions. Those historians who are working on Gutenberg-e
projects sponsored by the American Historical Association are begin- 
ning to encounter the limitations that librarians live with every day. 
When faced with the opportunity not only to write their text for elec- 
tronic distribution but also to present their sources through digital 
surrogates, the historians find themselves facing dilemmas familiar 
to digital collection builders everywhere. How much of the source 
material is enough to represent the base from which an argument 
was built? How can one select materials that give a sense of the 
scope of the original from which the scholar made his or her choices? 
And why is digitization of even a few core files so expensive? 

Many of the scholar-designed projects may be coherent digital 
objects in themselves, but they would fail the librarian's test of com- 
prehensiveness as a collection. Indeed, one could say that the value 
added by the scholar lies precisely in its selectivity. Some of those 
projects, most notably the Valley of the Shadow Civil War site, at- 
tempt to bring together materials that complement and enrich each 
other but do not try to comprehend the great universe of materials 
that could be considered complementary. These new digital collec- 
tions are somewhat analogous to published anthologies of primary 
sources, carefully selected by an individual or an editorial team to 
serve heuristic purposes or to provide supporting evidence for an 
interpretation. Other projects driven by scholar selection, such as the 
"Fantastic" collection of witchcraft and French Revolutionary source 
materials at Cornell or the Women Writers' Resources Project at Em- 
ory University, do not claim to be comprehensive, but serve as point- 
ers to the collection by presenting a representative sampling of it. Yet 
others, such as the Blake Archive or APIS, serve primarily to
collocate items to form a new virtual collection that then serves as a
new paradigm of critical edition.

General Collections. To date, very few libraries have digitized 
significant series of books and periodicals, whereas, as the ARL spe- 
cial collections survey shows, a great many libraries are digitizing 
their special collections. Several reasons for this selection strategy are 
commonly given, and others can be inferred. 

There has been a preference to digitize visual resources over tex- 
tual sources, in part because they work so well online and in part be- 
cause visual resources do not require the additional expense of OCR
or text encoding that adds value to textual materials. (Creating
metadata for visual resources that are not well indexed, however, often
ends up being more expensive.) Printed sources do not require addi- 
tional features, of course, but simple page images of non-rare texts 
do not provide the enhanced access that most researchers want from 
digital text. Nearly all selection criteria call for a specific additional 
functionality, such as browsing and searching, from text conversions. 

A number of commercial interests are working with publishers 
or libraries to provide digitized versions of texts that have a potential 
market; one example is Early English Books Online (EEBO). For the 
core retrospective scholarly literature that is in high demand, there 
are commercial and nonprofit providers, such as Questia and JSTOR, 
ready to run the copyright gauntlet that libraries are ill equipped to 
handle efficiently. ArtSTOR, a production-scale digitization program 
initiated in the summer of 2001 and modeled on JSTOR, will develop 
a database of art and architecture images based on analysis of curric- 
ular needs for higher education. It is the copyright issue associated 
with these sources that has largely precluded individual institutions 
from digitizing their slide libraries and other visual resources and 
contributing them to a database that would build contextual mass. 

In addition to these considerations, there is a sense that a library 
can help build institutional identity by digitizing materials that are 
unpublished or not commonly held. This can be important in en- 
couraging alumni loyalty or in recruiting students. This assump- 
tion — that special collections build institutional identity and general 
collections do not — is actually challenged by the success of the Mak- 
ing of America (MOA) projects at Michigan and Cornell and by the 
texts encoded at UVA. These institutions have achieved considerable 
renown for their collections of monographs and periodicals. Yet it is 
reasonable to assume that such massive digitization programs are 
not easily replicated by institutions with smaller digital infrastruc- 
ture. For those institutions, special collections allow a smaller-scale 
approach to developing a Web presence. 

But, as the MOA projects highlight, two of the challenges faced 
by libraries mounting print publications are how much is too little 
and how much is perhaps too much. The sense that textual items 
need to exist in a significant or critical mass online stems in part from 
the fact that books and magazines do not have quite the same cultur- 
al frisson as do Jefferson holographs or Brady daguerreotypes. Few 
libraries are in the position to mount the kind of large-scale digitiza- 
tion projects that can result in a critical mass of text online, and for 
reasons to be discussed below, they do not enter into the kind of co- 
operative arrangement that is at the heart of MOA. 

Special Collections. In 1995, when several academic libraries 
were working together to mount the text-based Americana in the 
Making of America project, LC inaugurated a digitization program, 
American Memory, based on its Americana special collections. The 
program was ambitious (they targeted five million images in five 
years) and has been influential largely because of the extensive and
easily adapted documentation that the library has mounted on its 
Web site and the well-publicized redistribution grants that it gave 
under its LC/Ameritech funding. The requirements for those grants
were based on Library of Congress experience, and they have
significantly influenced the requirements of other funding agencies,
including the Institute of Museum and Library Services (IMLS). The only
other library that has similar collecting policies and a similar gover- 
nance and funding structure is the NYPL, and the digital program it 
plans to implement over the next few years bears remarkable resem- 
blance to that of LC: both have an ambitious time frame, focus on 
special collections, and intend to make access to the general public as 
high a priority as service to scholars. They share, in other words, the 
same strategic view of digitization — one that is well in line with the 
realities of their roles as public institutions and with their audiences, 
collection strengths, and governance structures. 

Many libraries that are not similar to LC or NYPL have also used 
this strategy. In the early stages of its digitization program, Indiana 
University reports that it used the LC/Ameritech Competition
proposal outline to assess the merits of collections for digitization. This
led to a canvass of the university's libraries for "their most signifi- 
cant collections, preferably ones in the public domain or with Indi- 
ana University-held copyrights." Then, with (special collection) can- 
didates in hand, the library examined them for what it identified as 
the basic criteria: "the copyright status of the collection; its size; its 
popularity; its use; its physical condition; the formats included in the 
collection . . . and the existence of electronic finding aids" (Brancolini 2000).

Libraries that depend on outside funding — the great majority of 
libraries digitizing collections — often assert that it is easier to raise 
funds if they propose to digitize special collections because they are 
more interesting and have greater appeal to the funding agencies. 
This hypothesis is untested, although MOA, which comprises general
collections, has received major grant funds, so perhaps the
hypothesis has indeed been tested and proven invalid. Nevertheless,
this notion of the funding bodies' predilection for special collections 
continues to be persuasive. 

While academic libraries have many reasons for deciding to digi- 
tize special collections, the rationales of the two public institutions 
merit special consideration, in large part because they are so
different from those of academic libraries. (Because of these differences,
however, taking them as a model should be done with eyes wide
open.) The NYPL and LC base their selection decisions on their
understanding that they are not libraries within a specific academic
community, with faculty and students to set priorities. Rather, they 
serve a broad and often faceless community — the public. Their goal 
is to make available things that both scholars and a broader audience 
will find interesting. They also endeavor to make their collections 
accessible to those with modem connections and low bandwidth, of- 
ten limiting factors for the delivery of cartographic and audio mate- 
rials, among others. Because their primary audience is not academic, 
they have no curricular or educational demands to meet. They can 
focus exclusively on their mission as cultural institutions. Moreover, 
as libraries that have rich cultural heritage collections held in the 
public trust, they feel obligated to make those unique, rare, or fragile 
materials that do not circulate available to patrons who are unable to 
come to their reading rooms. Their strategic goal is cultural enrich- 
ment of the public. 

None of the research libraries with comparable deep collections 
claims cultural enrichment of the public as an explicit goal. And yet, 
as de Stefano points out, there are academic libraries that are mount- 
ing special collections of broad public appeal that is not matched to 
curricular needs (de Stefano 2001, 67). She cautions that, "It is only a 
matter of time until the question emerges as to how long the parent 
institutions will be satisfied with supporting the costly conversion of 
their library's materials to improve access for narrowly defined audi- 
ences that may not even be their primary local constituents." This 
strategy may be supportable as long as the overhead of serving a sec- 
ondary audience beyond the campus is low. But anxiety about this 
issue was repeatedly expressed in the conversations held during re- 
search for this paper. 



3. Institutional Impacts 



Libraries face several issues as a consequence of undertaking digital 
conversion. Some of these issues are constraints that may limit the 
potential of digitization to enhance research and teaching. They must 
be identified and explored to assess accurately the costs and benefits 
of digitization. Other issues may not be constraining as such, but 
may put new pressures on libraries to expend further resources.
They, too, must be scrutinized and managed if digital programs are
to be developed cost-effectively and with the greatest possible bene- 
fit to the collections and their users. 

3.1. Treatment and Disposition of Source Materials 

Selecting for scanning must include an assessment of the source's 
physical condition and readiness for the camera. For those items that 
are rare, unique, fragile, or otherwise of artifactual value, prepara- 
tion for scanning usually demands the attention of conservators, if 
only to confirm that the item will not be harmed by the camera.
Some type of prescanning treatment may also be required, from
strengthening or mending paper to removing environmental soil. In 
the case of certain media, such as medieval manuscripts or daguerre- 
otypes, the institution may need to call upon the services of a con- 
sultant or vendor with special expertise. This involves more time and 
resources (e.g., for writing contracts and checking items that return 
from a vendor) and may divert resources normally dedicated to oth- 
er conservation work. 

The disposition of scanned source materials that are not unique
or rare is a challenging subject that most libraries are just beginning 
to confront. When digitization becomes an acceptable, if not
preferred, alternative to microfilm for preservation reformatting, and
those items that can be networked are networked, what criteria will libraries use
to decide what to keep on campus, what to send to remote storage, 
and what to discard? For materials that are rare or unique, that
question should not arise. But what about back journals that will be
available from a database such as JSTOR, or American imprints that the
University of Michigan and Cornell have scanned and made available
without restrictions on their MOA site? The library community
never reached a consensus about this issue for microfilming. The his- 
torical inattention to originals on the assumption that some institu- 
tion would retain an original has led to avoidable material losses and 
to a serious public relations problem. 

Recent trends in scholarship have forced libraries to reexamine 
old assumptions about the value of original serials and mono- 
graphs — items that are not traditionally prized for their artifactual 
value (Council on Library and Information Resources 2001). It is im- 
portant that research libraries find cost-effective means of preserving 
originals and making them readily accessible. This does not mean 
that all libraries should keep multiple copies of items that research- 
ers prefer to use electronically. But 20 years from now, when many 
scholars may well prefer accessing these materials electronically over 
retrieving them from library shelves, will the library community 
have developed a collective strategy for preserving a defined num- 
ber of originals for access purposes and reducing the redundancy of 
print collections? How will researchers who wish to have access to 
originals be able to find out where they are and how they can be 
viewed? Current plans to register digitized items in a nationally co- 
ordinated database include consideration of noting the location of a 
hard copy available for access, and the fate of originals is beginning 
to get the attention that researchers demand (Digital Library Federa- 
tion 2001). 

For certain very fragile media, such as lacquer sound discs or 
nitrate film, the original or master should seldom or never be used 
for access purposes. Service of those collections should always be 
done on reformatted (access) media. However, that is an expensive 
proposition for any library, and there is great resistance to pushing the
costs of preservation transfer onto the user. Regrettably, a great num- 
ber of recorded sound and moving image resources continue to be 
played back using the original. Until digitization is an affordable op- 
tion for access to these media, their preservation will remain at very 
high risk. 

3.2. Scalability 

Digitizing either general or special collections presents challenges 
regarding size: How many items from any given collection will be 
sufficient to create added value? "Critical mass" is one selection
criterion that shows up in nearly all the written guidelines for selection
and is commonly noted in conversation. The magic of critical mass, 
in theory, is that if you get enough related items up in a commonly 
searchable database, then you have created a collection that is richer 
in its digital instantiation than in its analog original. This is premised 
on the notion that the technology has a transformative power — that 
it can not only re-create a collection online but also give it new func- 
tionality, allow for new purposes, and ultimately create new audi- 
ences that make it available for novel queries. It does this by, for ex- 
ample, turning static pages of text or numbers into a database. 
Monographs are no longer limited by the linear layout of the bound 
volume or microfilm reader. By making texts searchable, librarians 
can create new resources from old ones and transform items that 
have had little or no use into something that receives hundreds or 
thousands of hits. 

But how much is enough? A critical mass is enough to allow 
meaningful queries through curious juxtapositions and comparisons 
of phenomena, be it the occurrence of the word "chemise" in a run of 
Victorian novels or the U.S. Census returns from 1900. A large and 
comprehensive collection is valuable because it provides a context 
for interpretation. But in the digital realm, critical mass means some- 
thing quite new and as yet ill-defined. The most salient example of 
this phenomenon is the MOA database at the University of Michigan,
which contains thousands of nineteenth-century imprints at risk
of embrittlement or already embrittled. Staff report that although the 
books themselves were seldom called from the stacks, the MOA
database is heavily used, and not exclusively by students and teachers
of Michigan. Members of the University of Michigan community use 
MOA most heavily, but among its largest users is the Oxford Univer- 
sity Press, which mines the database for etymological and lexical re- 
search. Is this database heavily used because it is easily searched and 
the books were not? Because one can get access to it from any com- 
puter in any time zone, while the books were available to only a 
small number of credentialed users? Were the books of no research 
value when they languished in remote storage? And what is their 
research value now? It is hard to isolate a single factor that is
decisive. Attempts to create equally valuable critical masses must
address not only content but also searchability, ease of use, ranking on
various search engines, and so forth. 

In the case of general collections, or imprints, one must select 
enough that they, taken together, create a coherent corpus. In one 
case, time periods may set the parameters; in another, it might be 
genre or subject. In many ways, what makes a digitized collection of 
print materials singularly useful is the ability to search across titles 
and within subjects. The more items in the collection, the more seren- 
dipitous the searching. But as JSTOR has shown, incremental in- 
crease in the number of titles in the corpus is possible because they 
already exist within a meaningful context and are available through 
a nimble search and retrieval protocol. 

Critical mass could more accurately be thought of as "contextual
mass," that is, a variable quantity of materials that provides a context
for evaluation and interpretation. In the analog realm, searching 
within a so-called critical mass has always been very labor-intensive. 
It has taken great human effort and patience to identify the relation- 
ships in and among items in a collection, and it has been possible 
only within collections that are physically located together. But once 
those items are online, in a form that is word-searchable, one has a 
mass that is accessible to machine searching, not the more arduous 
human researching. Theoretically, when many related collections ex- 
ist together on the Web, they create a significantly more meaningful 
source than these same collections would if not linked electronically. 
In reality, achieving such a contextual mass across institutional 
boundaries will remain elusive as long as the collections are not in- 
teroperable. 

For archival-type collections, which are not necessarily text 
based and are usually under looser bibliographical control than are 
published works, the amount of material needed to get a critical 
mass can defy the imagination, or at least challenge the budget. If a 
collection is too large to digitize — for example, a photo morgue or 
institutional records — staff may choose to digitize a portion that rep- 
resents the strengths of the collection. But what portion? How much 
is enough? These are subjective decisions, and they are answered dif- 
ferently by different libraries. In the public libraries, with no faculty 
to provide advice, the decisions have been made by the curatorial 
staff. The Library of Congress has called in scholarly consultants and 
educational experts from time to time to aid in selection decisions, 
but the curatorial staff always makes the actual decisions. The NYPL 
relies on a curatorial staff that is expert in a number of fields. Like 
most cultural heritage institutions, it has long corporate experience 
in selecting for exhibitions. Curatorial staff in academic special col- 
lections libraries often have the opportunity to work with faculty or 
visiting researchers who collaborate in shaping a digital collection 
and even add descriptive and interpretive text to accompany items. 

But many curators see digitizing anything less than a complete 
collection as "cherry picking," which results in a collection that does 
not support the research mission of the institution. Others are less 
severe and cheerfully admit that for most researchers, a little bit is 
better than nothing at all, and very few researchers mine any single 
collection to the depth that we are talking about. Those who do, they 
assert, would end up seeing the collection on site at some point in 
any event. These judgments are generalized from anecdotal experi- 
ences and are not based on objective data. When asked, for example, 
about how research techniques in special collections may be affected 
by digitization, some librarians said that research will be pursued by 
radically different strategies inside of a decade. Others think that re- 
search strategies for special collections materials will not change, 
even with the technology. The important thing, in their view, is not to 
get the resources online but to make tools for searching what is avail- 
able in libraries readily accessible on the Web — tools such as finding 
aids. The NYPL has secured money to do long-term studies of users
of digitized special collections to gather information about use and 
to test assumptions. More needs to be done. A significant portion of 
grant-funded digitization, especially that supported by federal and 
state funds, should include some meaningful form of user analysis. 

The California Digital Library (CDL) has inaugurated a project, 
called California Cultures, designed to make accessible a "critical 
mass of source materials to support research and teaching. Much of
this documentation will reflect the social life, culture, and commerce 
of ethnic groups in California" (CDL 2000). The collection will com- 
prise about 18,000 images. The California Digital Library sees collab- 
oration as a key element in scalability. Because of funding and gover- 
nance issues, the CDL believes that it must foster a sense of 
ownership and responsibility for these collections among creators 
statewide, locality by locality. 

The role of scholars in selecting a defined set of contextually 
meaningful sources often works well for published items in certain 
disciplines. Agriculture and mathematics are examples where schol- 
ars have been able to come up with a list of so-called core literature 
that is amenable to comprehensive digitization. By way of contrast, 
curators may do a better job in selecting from special collections of 
unpublished materials — musical manuscripts, photo archives, per- 
sonal papers — than scholars do. These are materials with which only 
curatorial staff members are sufficiently familiar to make selection 
decisions. While there are exceptions to this rule, the sheer quantity 
of materials from which to select often makes the involvement of 
scholars in all decisions impractical and hence unscalable. 

Scholars tend to have a different concept of the critical mass than 
do librarians. Projects such as the Blake Archive and the Digital 
Scriptorium are built with the achievement of a critical mass for 
teaching and research in mind. Whereas a collection-driven, text- 
based program such as MOA can convert massive amounts of text 
and make it searchable, it can also put up materials without an inter- 
pretive framework. Other projects, such as APIS and, to a large de- 
gree, American Memory, invest time and money in creating interpre- 
tive frameworks and item-level descriptions that never existed when 
the items were analog, confined to the reading room, and served by 
knowledgeable staff. In many ways, this type of access is really a 
new form of publishing, not library service as it has been historically 
understood. 

3.3. Intellectual Control and Data Management 

The scarcity of cataloging or description that can be quickly and 
cheaply converted into metadata is often a decisive factor in exclud- 
ing a collection from digitization. Given that creating metadata is 
more expensive than is the actual scanning, it is necessary to take ad- 
vantage of existing metadata — that is, cataloging. Often, money to 
digitize comes with promises by library directors that they will put 
up several thousand — even million — images. This is a daunting 
pledge. To mount five million images in five years, as LC pledged to 
do, has meant giving priority to large collections that already have 
extensive bibliographical controls. The NYPL is likewise giving se- 
lection preference to special collections that already have some form 
of cataloging that can be converted into metadata to meet production 
goals. In this way, expedience can theoretically be happily married to 
previous institutional investments. These libraries have put enor- 
mous resources into creating descriptions, exhibitions, finding aids, 
and published catalogs of prized institutional holdings. One can as- 
sume that a collection that has been exhibited or made the subject of 
a published illustrated catalog has demonstrated research and cul- 
tural value. 

Some collections that are supported by endowments can make 
the transition to digital access more easily than can others, because 
funds may be available for this within the terms of the gift. The 
Wallach Division of Art, Prints, and Photographs, for example, at the 
NYPL will be put online as the Digital Wallach Gallery. There are a 
number of grant applications that not only build the cost of metadata 
creation into the digitization project but also appear to be driven in 
part by a long-standing desire on the part of a library finally to get 
certain special collections under bibliographical control. 

It can be quite difficult, however, to harmonize descriptive prac- 
tices that were prevalent 40 years ago with what is required today. 
The expansive bibliographical essays that once were standard for de- 
scribing special collections need quite a bit of editing to make them 
into useful metadata. It is not simply a question of standards, which 
have always been problematic in special collections. The problem is 
that people research and read differently on the Web than they do 
when sitting with an illustrated catalog or finding aid at a reading 
desk. Descriptive practices need to be reconceptualized for presenta- 
tion online. This reconceptualization process is several years off, 
since we have as yet no basis for understanding how people use spe- 
cial collections online. 

For monographs and serials, the genres for which the MARC record 
was originally devised and for which it is a well-understood standard, 
retooling catalog records need not be complicated or expensive. For 
materials that are published but not primarily text based, such as 
photographs, posters, recorded speech, or musical interpretations, 
the MARC record has noted limitations and those tend to be accentu- 
ated in the online environment. Unpublished materials share this di- 
chotomy of descriptive practice between textual and nontextual. For 
institutions that have chosen to put their special collections online, 
tough decisions must be made about how much information can be 
created in the most cost-effective way. In some cases, rekeying or 
OCR can be used to produce a searchable text in lieu of creating sub- 
ject access. For handwritten documents, non-Roman scripts, and au- 
dio and visual resources, searching remains a problem. 

The context for interpretation needs to be far more explicit in the 
online environment than in the analog realm. It is interesting to think 
about why that is so. Are librarians creating too much descriptive 
material for online presentation of those collections that have 
successfully been served in reading rooms with no such level of 
description, or is it too little? Are librarians assuming that the level 
of sophistication or patience of the online user is far lower than that of the 
onsite researcher? It is commonly assumed that an online patron will 
not use a source, no matter how valuable, if it is accompanied by 
minimal-level description. This assumption may be well founded in 
principle, and it is certainly true that the deeper and more structured 
the description, the likelier it is that the item will be found through 
the various searching protocols most in use. But by removing re- 
search collections from the context in which they have traditionally 
been used — the reading room — one also removes the reference staff 
who can guide the patron through the maze of retrieval and advise 
about related sources. Materials that the public has had little experi- 
ence using are now readily available online. If patrons lack certain 
research skills, the resources will remain inaccessible to them. 

The ease of finding digitized items on library Web sites varies. A 
few sites are constructed in a way that makes finding digitized col- 
lections almost impossible for people who do not already know they 
exist. Other sites have integrated the surrogates into the online cata- 
log and on OCLC or RLIN or both. Those DLF members whose pri- 
mary purpose in digitization is to increase access to special collec- 
tions and rare items have expressed willingness to expose the 
metadata for these collections to a harvester using a technical frame- 
work established for the Open Archives Initiative. 
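
As an illustration of what exposing metadata to a harvester involves, 
the sketch below builds the kind of request a harvester might send to a 
library's repository. It assumes the request syntax of the Open Archives 
Initiative Protocol for Metadata Harvesting (the ListRecords verb and the 
oai_dc metadata prefix); the base URL and the function name are 
hypothetical and are not taken from any DLF member's system. 

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    base_url is a hypothetical repository endpoint supplied by the caller.
    set_spec optionally limits the harvest to one named set of records.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only for illustration.
url = list_records_url("https://example.org/oai")
```

A harvester would fetch this URL and parse the returned XML records; 
that step is omitted here. 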

3.4. Coordinated Collection Development 

The idea of coordinated collection development of digital objects is a 
powerful one. It motivated the Berkeley, Michigan, and Cornell li- 
braries to work together to mount several collections from their own 
holdings that could be termed part of one, the Making of America. 
Given the resources that must be dedicated to creating digital collec- 
tions and the resources it takes to build the infrastructure that allows 
access to them, it would seem that the only way to build truly scal- 
able collections is through some cooperative effort. 

But for all the talk of building federated collections that will ag- 
gregate into a digital library with depth and breadth — that is, critical 
mass — the principle of "states' rights" remains the standard. Each 
institution decides on its own what to digitize, and usually does so 
with little or no consultation with other libraries. There are funding 
sources that require collaboration in some circumstances — the Li- 
brary of Congress's Ameritech grant is an example — but the extent of 
collaboration usually has to do with using the same standards for 
scanning and, sometimes, description. Selection is not truly collabo- 
rative; it could more properly be characterized as "harmonized the- 
matically." Institutions usually make decisions based on their partic- 
ular needs rather than on community consensus about priorities. 

How do we know that we are not duplicating efforts in digitiza- 
tion, even when the content comes from special collections? Two 
sheet music projects, the Levy Sheet Music Collection at Johns 
Hopkins University and the Historical American Sheet Music Collections 
at Duke University, may or may not have significant overlap, for 
example. There may be sound reasons to scan each collection in full, 
even if there is overlap. But we are not able to make that kind of de- 
cision at present without a comprehensive database about such 
projects. Some will argue that duplication is good and that dreams of 
standardization are premature. Nonetheless, while duplication may 
have some benefits at these early stages of research and develop- 
ment, we are unable to take advantage of these benefits unless we 
know what others are doing. A registry that would make core pieces 
of information available, not only about content but also about tech- 
nical specifications, disposition of originals, and cataloging prefer- 
ences, is a critical component of infrastructure that would allow each 
institution to make informed decisions on a series of matters (Digital 
Library Federation 2001). 
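
A minimal sketch of what one entry in such a registry might hold, and 
how two entries could be compared for overlap, follows. The field names 
mirror the core pieces of information listed above; the class, the 
function, and the idea of matching on shared item identifiers are 
assumptions made for illustration, not a description of any registry 
that exists. 

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class RegistryEntry:
    """One digitization project as it might be described in a registry."""
    institution: str
    collection: str
    # Hypothetical shared identifiers for the items digitized.
    item_ids: Set[str] = field(default_factory=set)
    technical_specs: str = ""           # e.g., resolution, file format
    disposition_of_originals: str = ""  # e.g., retained, reshelved
    cataloging: str = ""                # e.g., MARC, finding aid


def shared_items(a: RegistryEntry, b: RegistryEntry) -> Set[str]:
    """Return the identifiers that appear in both projects."""
    return a.item_ids & b.item_ids
```

With entries of this kind on file, an institution could check for 
overlap before scanning and then decide deliberately whether the 
duplication is worthwhile. 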

3.5. Funding 

Some library staff worry that institutional concerns, such as fund 
raising, public relations, and special projects, divert too many re- 
sources from more academically defensible projects or from the core 
mission of the library. When asked to take on projects conceived to 
advance the home institution's mission, rather than the research and 
teaching mission of the library per se, most library administrators 
seem willing to accept this "good citizen" role, and some use it to the 
library's distinct advantage. Even a vanity project, if managed prop- 
erly, will bring money into the library for digitization and provide 
the kind of training and hands-on experience that is necessary to de- 
velop digital library infrastructure and expertise. The key to building 
on such a project is to be sure that all the library's costs, not only 
scanning but also creating metadata, migrating files, and so forth, are 
covered. Such projects, done willingly and well, usually enhance the 
status of the library within the community and seldom do long-term 
harm. The issue bodes ill only when libraries deliberately seek fund- 
ing for things that are not core to their mission or when staff and 
management are diverted to support low-priority projects. Looked at 
from this perspective, outreach can properly be considered part of 
the library's mission. 

Academic libraries such as those at Virginia and Cornell, which 
are funded by a mix of private and public monies, are liable to face 
pressures to serve not only research and teaching needs but also state 
and regional interests. These need not be mutually exclusive, and 
even Harvard University has demonstrated its good citizenship by 
contributing items of interest to Cantabridgeans on its publicly ac- 
cessible Web site. The key lies in achieving a balance and, if possible, 
a synergy between the two. 

For public institutions, digital programs offer a new and unique 
way to serve collections to taxpayers. For example, online 
distribution is the only way LC can provide access to its holdings in 
all congressional districts. For the primary funders and governors, that is, 
members of the U.S. Congress, who have built and sustained the li- 
brary on behalf of their constituents, this rationale is compelling. 
Similarly, while the NYPL is not fully supported by public funds, its 
choices are nonetheless strongly influenced by the priorities of state 
and local governments. 

Academic libraries with dual funding streams — private and pub- 
lic — are most vulnerable when the state in which they are located 
expresses some expectation that the university library will mount 
materials from its collections that are aimed at the kindergarten 
through twelfth-grade user group. There is much talk of how access 
to primary source materials held in research institutions can trans- 
form K-12 education. This hypothesis needs to be tested. The fact 
that much grant funding is tied to K-12 interests has made it neces- 
sary for universities to try to shape research-level materials into a K- 
12 mold to secure funding, or at least to mask a research collection as 
one that is also suited for younger audiences. 

There is no doubt that public institutions are seen as holding a 
promise to improve the quality of civic life if they provide greater 
access to their holdings. The fact that the NYPL and LC have been so 
successful in securing funds from private citizens is a clear indication 
of the public esteem that these institutions enjoy and the desire to 
"get the treasures out." This level of philanthropy would be unthink- 
able in any other country. While these libraries may be accused of 
pandering to donors on occasion or of not paying enough attention 
to the academic community by digitizing materials that are not in 
demand first and foremost by scholars, the fact is that they are public 
libraries. Unlike the libraries in state universities, they are not de- 
signed to serve exclusively, or even primarily, the scholarly commu- 
nity. This obligation to serve the public, however, does not skew se- 
lection for digitization as drastically as some assert. Donors may 
express an interest in a particular type of material, but in the end, 
they choose from a set of candidate collections that have been pro- 
posed by curatorial divisions and vetted by preservation and digital 
library staff for technical fitness. In terms both of process and result, 
these candidate collections differ little from their private academic 
counterparts. 

In both public and private libraries, some curators who are ac- 
tive in special collection development advocate for digitization be- 
cause they see it as a way to induce further donations. For them, the 
promise of access is a useful collection-development tool because 
digital access advertises what the library collects and demonstrates a 
commitment to access. 

3.6. Preservation 

Much less has been written about how to plan for the access to and 
preservation of digitally reformatted collections over time than about 
how to select materials for digitization. This is partly because we 
know nothing certain about maintaining digital assets over the long 
haul. We have learned a great deal as a result of failed or deeply 
flawed efforts — those of the "We'll never do that again!" variety told 
of some projects to reformat information onto CDs, for example — but 
such lessons tend to be only informally communicated. Exceptions 
include the University of Michigan, where one library that has a 
clear view of what role digitization plays — that of collection manage- 
ment and preservation — has developed and published preservation 
policies that support those goals. The CDL is also an exception, per- 
haps because, as a central repository, establishing standards and best 
practices to which its contributors must adhere is paramount to 
building confidence as well as collections. Harvard University has 
published information about its plans for a digital repository, and 
Cornell has adopted policies for "perpetual care" of digital assets. 
The Library of Congress has also put online much about its planned 
audiovisual repository in Culpeper, and has announced a plan to de- 
velop a national digital preservation infrastructure (LC 2001). Gener- 
al information about the preservation of digital files can be found on 
the Web sites of the Cedars Project, CLIR, and Preserving Access to 
Digital Information (PADI). 

Nearly every library declares its intention to preserve the digital 
surrogates that it creates. The Library of Congress has also pledged 
to preserve those surrogates created by other libraries under the aus- 
pices of the National Digital Library Program (Arms 2000). In reality, 
however, many libraries have created digital surrogates for access 
purposes and have no strategic interest in maintaining them as care- 
fully as they would have if they had created those files to serve as 
replacements. Nonetheless, at this point libraries are uncomfortable 
admitting that they have a limited commitment to many of their sur- 
rogates, should push come to shove. Those who are creating surro- 
gates for access purposes alone still declare an interest in maintain- 
ing those surrogates as long as they can because the original 
investment in the creation of digital files has created something of 
enormous value to their patrons. Moreover, the cost of having to re- 
create those surrogates and the physical stress it might impose on the 
source materials argue for maintaining those files as long as possible. 

The mechanism for long-term management of digital surrogates 
is theoretically no different from that for management of born-digital 
assets. While refreshment and migration of digital collections have 
occurred in many libraries, the protocols and policies for preserva- 
tion are clearly still under development. Many libraries have been 
sensitized to the fact that loss can be simple and catastrophic, begin- 
ning with the wrong choice of (proprietary) hardware, software, or 
medium on which to encode information and ending with negligent 
management of metadata. The Y2K threat that libraries faced in 1999 
has led to systemic improvements in many cases. Not only did insti- 
tutions become aware of how deleterious it is to allow different soft- 
ware to proliferate but they also developed disaster-preparedness 
plans. Many received funds for infrastructure upgrades that might 
have been awarded much later, or not at all, were it not for the sense 
of urgency that the coming of Y2K provided. 

Libraries are anticipating the day when they must develop 
strategies for handling digital objects that faculty are creating without the 
involvement of the library. These are the often-elaborate construc- 
tions done by individual scholars or groups of collaborators that the 
library hears about only after the critical choices of hardware, soft- 
ware, and metadata have been made, often by people wholly un- 
aware of the problems of long-term access to digital media. An in- 
creasing number of library managers have expressed concern about 
the digitized materials created by faculty that are "more than a Web 
site" yet less, often far less, than what the library would choose to 
accession and preserve. While libraries acknowledge that this is a 
growing problem, none has been forced to do much about it yet, and 
thoughts about how to deal with faculty projects are just now evolv- 
ing. Predictably, those that are collection-driven in approach are 
working to build a system for selecting what the library wishes to 
accession to its permanent collections. Cornell is developing criteria 
that individuals must work to if they expect that the library will pro- 
vide "perpetual care" (Cornell 2001). CDL already has such guide- 
lines in place. Michigan has a well-articulated preservation policy, 
one that is detailed enough to support the university's vision of digi- 
tal reformatting as a reliable long-term solution to the brittle book 
problem. 

3.7. Support of Users 

There is little understanding of how research library patrons use 
what has been created for them. Most libraries recognize that the col- 
lections they now offer online require a different type of user support 
than that which they have traditionally given visitors to their read- 
ing rooms. In many cases, user support has been developed for "dig- 
ital collections" or "digital resources," terms that almost invariably 
denote born-digital (licensed) materials. The Library of Congress, 
which targets a K-12 audience, has three reference librarians for its 
National Digital Library Program Learning Center. As a rule, howev- 
er, libraries have not been reallocating staff to deal specifically with 
digitized collections. Hit rates and analyses of Web transactions have 
yielded a great amount of quantitative data about access to digital 
surrogates, and those data have been mined for a number of internal 
purposes, from "demonstrating" how popular sites are to making 
general statements about how users are dialing in and from where. 
Qualitative analysis is harder to derive from these raw data, and 
there have been few in-depth analyses of how patrons are reacting to 
the added functionality and convenience of materials now online. 
Libraries have been keeping careful track of gate counts, for exam- 
ple, but when these counts go up or go down, what conclusion can 
one draw about the effect of online resources on use of on-site re- 
sources? 

One exception to this rule is the journal archiving service, JSTOR, 
which rigorously tracks the use of its resources. JSTOR analyzes its 
users' behavior because it needs to recover costs and hence must stay 
closely attuned to demand, within the constraints of copyright 
issues. Looking ahead, close analysis of how researchers use specific 
online resources, especially how they do or do not contribute to the 
productivity of faculty and students, will be a prime interest of 
libraries. This work needs to be complemented by analysis of free Web 
sources mounted by libraries. 

Most libraries report having classes and other instructional op- 
tions available for students and faculty. Some librarians report that 
instruction is not really necessary for undergraduates, who are quite 
used to looking first online, but that general orientations to library 
collections are needed more than ever. 

Good Web site design, that is, creating sites that are easily found, 
easily navigated, and readily comprehensible, is an often overlooked 
aspect of access. Within the first-generation libraries, there is an 
astonishing variety in the quality of Web site design. Even for fairly 
sophisticated users, finding a library's digital collections can involve 
going through a half-dozen screens. Having a professional design 
team that keeps a site up-to-date and constantly reviews it for im- 
provements does cost money, but considering that the Web site is the 
front door to the collections, it would seem penny wise and pound 
foolish to ignore design and marketing. 



4. Conclusions and Recommendations 



Digitization, like other collection-development strategies, works to 
the extent that it supports the mission of an institution. Without a 
clear sense of how digital projects may fit an institution's mission, it 
is difficult to build a strategy for sustaining them. It may still be too 
early for libraries to be thinking far into the future as far as digitiza- 
tion is concerned. This is a period of experimentation, of building 
skills and experience in staff, and of tracking one's own progress and 
that of others. In some libraries, the staff feels driven by the need to 
have some digital projects, whether or not library leaders have made 
clear what purpose such projects will ultimately serve. For the time 
being, modest strategies may be the best route. Simply declaring a 
project or series of projects as experiments and stating what goals 
can be achieved through those experiments will help begin to free 
staff for creative and reflective activity. Certain digitization ap- 
proaches work chiefly for large libraries and make little sense for 
small ones; others work for large public libraries but not large aca- 
demic ones. Not all strategies are scalable, and any digital projects 
that exist in splendid isolation from other parts of the home institu- 
tion or other libraries risk turning into a waste of resources. 

4.1. Costs 

A useful way of looking at the strategic value of digitized collections 
is to ask pointed questions that reflect perceived costs and benefits, 
such as the following: What would happen if these programs 
received a dedicated allocation? Would the money for creating digital 
surrogates come from the acquisitions budget? From preservation? If 
the program were supported through a separate line, from which 
pocket of money would these funds be reallocated? Should fees be 
charged for access, at least in some cases? 

Michigan, which says that it left project-based digitization three 
years ago, has ongoing budget support for the staffing, equipment, 
and other infrastructure services such as servers and software re- 
sources for digital conversion production. The university anticipates 
that future projects will require additional funds to support special 
needs outside the core capacities or to handle a larger volume of 
work. Harvard's Digital Library Initiative is supported by internal 
funds, although its purpose is not collection development primarily 
but building infrastructure. 

At most libraries, digitization costs are covered by external 
funds, and the projects developed appeal to the intended funding 
source, be it a federal agency with stringent grant conditions; a pri- 
vate foundation that has a heuristic interest in projects; or donors 
and alumni, who usually contribute to the institution for eleemosy- 
nary purposes and often do so out of dedication to the institution 
and its mission per se. When asked about their priorities for selec- 
tion, many respondents remarked wryly that they digitize what they 
can get money to do, implying and even sometimes stating directly 
that their choices were skewed by donors' priorities and did not 
serve pure scholarship or other core missions of their institutions. 
However, what selectors, curators, and bibliographers think to be of 
highest value will also often differ from what the administration 
identifies, because they have differing views of where scholarship is 
moving, how sophisticated the users are, and what is of lasting im- 
port. 

Some librarians expressed great concern about the fact that, as 
long as libraries are competing for outside funds to digitize, they will 
be stuck in the entrepreneurial phase in which collection develop- 
ment is driven by strong personalities — those who are willing to 
compete for funds — and that some parts of the library's collections 
will go untapped simply because the subject specialist in that area is 
not the "entrepreneurial" type. Others express more serious concerns 
about the fate of non-English language materials and even greater 
anxiety about the neglect of non-Roman collections. 

Concerns about the changing role of library staff, especially of 
bibliographers, come up with increasing frequency. Staff become in- 
creasingly diverted from traditional collection-development duties to 
spend more time selecting for digitization — what might be called 
"reselecting." This is bound to have some effect on current collection 
development of traditional materials. A topic far more widely dis- 
cussed is where to find the skill sets that are needed for digital li- 
brary development. If libraries cannot afford to hire persons with 
such skills — as increasingly they cannot — how are they to go about 
developing them internally without robbing Peter to pay Paul? The 
same is true for preservation staff, who see themselves and their 
funding diverted from preserving deteriorating collections to 
creating digital versions of materials that are often not at imminent risk of 
deterioration. 

A number of special collections librarians have noted that, as 
items from the collections are digitized and given visibility, more 
people have become interested in the items and in related, not-yet- 
digitized materials. The numbers of on-site users and of phone, let- 
ter, and e-mail inquiries have risen. This places increased physical 
stress on original materials. This, in turn, increases the workload on 
staff, especially on preservation and reference staff. 

Reliable and meaningful cost data about digitization are rare and 
not often useful in comparative contexts. Costing out digitization 
means accounting for every step: selection, physical preparation, 
cataloging, physical capture, creation of metadata, mounting and 
managing files, designing and maintaining the site, providing 
additional user services, and implementing a long-term preservation 
strategy. Virtually every step in digitization involves human 
intervention and skill, and these costs, unlike those of storage, for 
example, are unlikely to go down. 



4.2. Benefits 

In an exercise not yet seen in the United States, Oxford University 
libraries recently looked at their experiences with digital conversion 
in an attempt to identify what benefits it had brought to the library 
and its patrons (Oxford 1999). Curiously, one of the chief benefits cit- 
ed was to curators, who learned an enormous amount about their 
own collections and those of other colleges. This seems an expensive 
way to break down barriers between departments and other library 
units or to make curators more familiar with their collections. But the 
report is alluding to one way in which the very role of bibliographer 
and curator is changing as a result of putting collections online. This 
is similar to reports from various campuses across the United States, 
which state that digital projects are crucial for staff training and de- 
veloping expertise. The matter of collection expertise, however, is not 
often mentioned; instead, reports about benefits focus on the other 
aspect of so-called e-curatorship, in which staff members develop 
technical and editorial or interpretive expertise. This does not always 
benefit the library or university in the end, because expert staff are 
difficult to retain and many managers complain that expenses in- 
curred in training a staff member served to benefit that individual's 
next employer. 

Benefits to users are also commonly cited, though as noted, sys- 
tematically gathered evidence of user satisfaction is rare. Managers 
also cite benefits to the collections through creation of surrogates that 
protect original materials while increasing access to the content. 

4.3. Looking Ahead 

If it is to be sustainable over time, digitization must clearly be an 
integral part of the core mission work of the library. Whereas the 
majority of research libraries engaged in digitization have been able to 
raise external funds for conversion, they all recognize the hazards of 
becoming too dependent on such funds. There is no such thing as a 
"free" building. Even if a donor were to pay for all aspects of the 
construction, from land acquisition to furnishing, at some point 
maintenance costs will become the responsibility of the home institu- 
tion, and the building must meet minimum criteria for support. 

The same holds true of digitized collections. Over the next few 
years, some libraries that have done digital projects will essentially 
phase them out; others will reduce this activity to the exception rath- 
er than the rule. Still others, committed to large-scale digital projects, 
either as a part of collection management or as a commitment to ex- 
tending access, will continue and will begin to address the tough 
questions of finding internal funds or developing fee-based services 
to support conversion, maintenance, and service. Whatever the sce- 
nario, libraries will need to articulate how digitization serves their 
core mission, be it outreach, preservation, access, or something else. 

If libraries continue to focus primarily on digitization of their special 
collections for access to primary sources (as opposed to discrete exhi- 
bitions for outreach purposes), they must act now to start building 
common infrastructures, such as a registry to obviate duplication 
and a commitment to rigorous cooperative collecting. One of the 
great potentials of the technology is to reunite collections that are 
dispersed and to create new meta collections (such as Blake's works) 
and allow several formats to interact. These are rather modest goals 
best achieved in well-defined projects, and a far cry from the rather 
common decisions now made to put up a large, interesting collection 
of archival or special collection materials. Without appropriate tools 
to search across collections and architectures that support interopera- 
bility, such collections will prove inconvenient to find and laborious 
to use. 

Looking back to the proceedings of a symposium on selection for 
digitization sponsored by the Research Libraries Group in 1995, we 
are confronted by similar concerns (RLG 1996). Among many factors 
identified at the meeting that remain valid today are the need for 
more focused selection of special collections, either with others to 
build a large cohesive body of materials or designed and scaled for 
classroom use; the need to engage collection development librarians
in selection for reformatting; the need to develop technical specifications
for a variety of things, from scanning to metadata creation, that
can be widely embraced by institutions of different size and mission;
the need to decide how to manage storage of surrogates and for how
long; the need to find outside funds to digitize and not be constrained
by that when selecting for conversion; and the need to identify
how digitization would add value to the source material, from
uniting disparate collections to making their contents more searchable.

A principal barrier to developing freely accessible digital collections
of enduring value is the lack of sustainable economic models
for digitizing collections, especially special collections. Each digitiza- 
tion program requires substantial investments of numerous kinds: 
processing and preserving source collections; creating interoperable 
systems for increased ease of access; copyright research and ongoing 
rights management models; and coordinated selection to increase the 
intellectual value of collections. On top of that are infrastructure ser- 
vices that need to be available for small or midsize institutions to 
keep the digital divide from deepening in the research community: 
digitizing and archiving services; a registry that would provide in- 
formation about what others are doing; and accepted practices for 
the capture and creation of metadata that can be used in good faith. 
The benchmarks for good collection development — rigorous selec- 
tion, accurate description, fitness-for-use, authenticity, and intellectu- 
al or cultural significance — do not change in the digital realm. It may 
be harder to know for certain how to build those collections online 
than in hard copy, but the core values of collections have not changed. 

4.4. Recommendations 

While it is too early to set standards or even memorialize best prac- 
tices for selection of collections for digital reformatting, libraries, 
funders, and research, development, and policy groups can take ac- 
tions that will minimize the chances that current investments will be 
wasted. These include taking deceptively simple steps in the course 
of decision-making, assessing progress and change, tracking expen- 
ditures across all phases of activity in a consistent manner, and en- 
gaging the widest possible group of people, from technologists to 
scholars, in selection, assessment, and planning. The following rec- 
ommendations come from a variety of sources in the field. All are 
based on experience and reflection about what has worked to date, 
and what next steps must be taken to advance the building of sus- 
tainable and valuable digital collections. 

Recommendations for Institutions 

• Be clear about the purpose of the project — for example, creation of 
preservation surrogates, outreach, or curricular development. 

• Begin with a clear understanding about whether or not the library 
will maintain the surrogates and for how long. 

• Develop clear protocols for selection decisions. 

• Develop cost projections for all aspects of digital conversion, from 
selection to refreshing of files. 

• Either focus on materials that are already organized or target materials
not under intellectual control; include funding for processing
and description in the project budget.

• Work with faculty and scholars to develop controlled vocabularies
in those fields lacking them, to aid in creating metadata. 

• Clarify the target audience for a collection: do not attempt to be all 
things to all audiences. 

• Secure funding for and conduct user assessments of digitized col- 
lections and make the information from these assessments avail- 
able to the library community. 

• Consider making metadata available for harvesting through the 
Open Archives Initiative protocol. 

• Design a Web site that is graphically clear and easy to navigate; 
update it frequently as appropriate. 

• Develop a policy for the disposition of reformatted materials. 

Recommendations for Research Agenda and Agenda for Consortia and Funding Agencies

• To build coherent collections, extend editorial board methodology, 
used for agriculture and mathematics, to select core literature in 
other disciplines. 

• Encourage funding agencies to support good practice by requiring 
minimal standards of capture, use of nonproprietary software, 
planning for maintenance of digital surrogates, and so forth. 

• Urge donors and consortia to engender partnerships for coopera- 
tive digitization of both general and special collections. 

• Examine the expectations of academic libraries' service to the 
broad public: what are the roles of municipal, state, and federally 
supported institutions for public access and those of private insti- 
tutions? 

• Conduct analysis of how digitized collections, both special and 
general, are used and by whom. 

• Foster the development of a registry of digitized collections that 
includes capture information, metadata standards, and disposi- 
tion of original source materials. 

• Foster the development of service bureaus that would offer scan- 
ning and archiving services. 

• Foster the development of shared print repositories. 